Elastic temporal alignment for few‐shot action recognition

Authors

Abstract

Few-shot action recognition aims to learn a classification model with good generalisation ability when trained with only a few labelled videos. However, it is difficult to learn discriminative feature representations for videos in such a setting. In this paper, Elastic Temporal Alignment (ETA) for few-shot action recognition is proposed. First, a convolutional neural network is employed to extract feature vectors of video frames sparsely sampled from each video. To obtain the similarity of two videos, a temporal alignment estimation function is utilised to estimate the matching score between each pair of frame features through an elastic mechanism. Our analysis shows that to judge whether two respective frames are matched, multiple adjacent frames should be considered, so as to embody temporal contextual information. Thus, before feeding the per-frame feature vectors into the alignment estimation function, message passing is leveraged to propagate information among the features in the temporal domain. The method has been evaluated on four action recognition datasets, including Kinetics, Something-Something V2, HMDB51, and UCF101. The experimental results verify the effectiveness of ETA and show its superiority over state-of-the-art methods.
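The pipeline in the abstract (frame features → temporal message passing → elastic matching score) can be sketched minimally as below. This is an illustrative stand-in, not the authors' formulation: `propagate` is a simple neighbour-blending message pass, and the "elastic" alignment is approximated by letting each frame of one video match its best cosine counterpart anywhere in the other video.

```python
import numpy as np

def propagate(features, alpha=0.5):
    """Toy temporal message passing: blend each frame feature with its
    neighbours so it carries local temporal context, then L2-normalise.
    features: (T, D) array of per-frame CNN features."""
    out = features.copy()
    out[1:] += alpha * features[:-1]   # message from the previous frame
    out[:-1] += alpha * features[1:]   # message from the next frame
    return out / np.linalg.norm(out, axis=1, keepdims=True)

def video_similarity(feats_a, feats_b):
    """Score two videos by averaging, for each frame of A, its best
    cosine match in B -- a crude 'elastic' alignment in which a frame
    may match any temporal position rather than a fixed one."""
    a = propagate(feats_a)
    b = propagate(feats_b)
    sim = a @ b.T                      # pairwise cosine similarities (rows are unit-norm)
    return sim.max(axis=1).mean()
```

In a few-shot episode, a query video would then be assigned the class of the support video with the highest such similarity.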


Similar articles

Appendix: Asynchronous Temporal Fields for Action Recognition

1.1. Description of the CRF

We create a CRF which predicts activity, object, etc., for every frame in the video. For reasoning about time, we create a fully-connected temporal CRF, referred to as an Asynchronous Temporal Field in the text. That is, unlike a linear-chain CRF for temporal modelling (the discriminative counterpart to Hidden Markov Models), each node depends on the state of every othe...


Spatio-temporal SURF for Human Action Recognition

In this paper, we propose a new spatio-temporal descriptor called ST-SURF. The latter is based on a novel combination of the Speeded-Up Robust Features (SURF) and the optical flow. The Hessian detector is employed to find all interest points. To reduce the computation time, we propose a new methodology for video segmentation into Frames Packets (FPs), based on interest-point trajectory tracking. W...


Learning Representative Temporal Features for Action Recognition

In this paper, we present a novel video classification methodology that aims to recognize different categories of third-person videos efficiently. The idea is to track motion in videos and extract both short-term and long-term features from the motion time series by training a multichannel one-dimensional Convolutional Neural Network (1DCNN). The positive point about our method is that we only...
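The core operation such a 1DCNN applies to a motion time series is a multichannel 1-D convolution. A minimal forward pass (valid padding, no bias or nonlinearity, shapes chosen here for illustration only) might look like:

```python
import numpy as np

def conv1d(signal, kernels, stride=1):
    """Minimal multichannel 1-D convolution (valid padding), as a toy
    illustration of sliding filters over a motion time series.
    signal: (C_in, T), kernels: (C_out, C_in, K) -> output: (C_out, T_out)."""
    c_out, c_in, k = kernels.shape
    t_out = (signal.shape[1] - k) // stride + 1
    out = np.empty((c_out, t_out))
    for o in range(c_out):
        for t in range(t_out):
            window = signal[:, t * stride : t * stride + k]
            out[o, t] = np.sum(window * kernels[o])  # dot product over channels and taps
    return out
```

For example, a single summing kernel of width 2 over the series `[1, 2, 3, 4]` yields `[3, 5, 7]`; stacking such layers with learned kernels captures progressively longer-term temporal patterns.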


Second-order Temporal Pooling for Action Recognition

Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth-order (max) or first-order (average) statistics are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...


Long-term Temporal Convolutions for Action Recognition

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level of a few video frames, failing to model actions at their full temporal extent. In this work we learn video representa...



Journal

Journal title: IET Computer Vision

Year: 2022

ISSN: 1751-9632, 1751-9640

DOI: https://doi.org/10.1049/cvi2.12127